Overview

Dataset Statistics

Number of Variables 21
Number of Rows 21597
Missing Cells 6281
Missing Cells (%) 1.4%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 5.7 MB
Average Row Size in Memory 278.7 B
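The Missing Cells figures can be checked against the per-variable counts in the Variables section. Only waterfront (2376), view (63) and yr_renovated (3842) have missing values, and the percentage is taken over all cells (rows × variables). The sketch below reproduces that arithmetic from the report's own numbers. The function name is illustrative, not part of any profiling library.

```python
# Per-variable missing counts taken from the Variables section of this report
missing_by_variable = {"waterfront": 2376, "view": 63, "yr_renovated": 3842}

n_rows, n_vars = 21597, 21

def missing_cell_pct(missing_cells: int, rows: int, cols: int) -> float:
    """Share of all cells (rows x variables) that are missing, in percent."""
    return 100.0 * missing_cells / (rows * cols)

total_missing = sum(missing_by_variable.values())
print(total_missing)                                              # 6281
print(round(missing_cell_pct(total_missing, n_rows, n_vars), 1))  # 1.4
```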
Variable Types
  • Numerical: 15
  • Categorical: 6

Dataset Insights

  • sqft_lot and sqft_lot15 have similar distributions
  • waterfront has 2376 (11.0%) missing values
  • yr_renovated has 3842 (17.79%) missing values
  • price is skewed
  • bedrooms is skewed
  • bathrooms is skewed
  • sqft_living is skewed
  • sqft_lot is skewed
  • grade is skewed
  • sqft_above is skewed
  • yr_renovated is skewed
  • sqft_lot15 is skewed
  • date has high cardinality: 372 distinct values
  • sqft_basement has high cardinality: 304 distinct values
  • floors values have a constant string length of 3
  • waterfront values have a constant string length of 3
  • view values have a constant string length of 3
  • condition values have a constant string length of 1
  • long has 21597 (100.0%) negative values
  • yr_renovated has 17011 (78.77%) zeros

Variables

id

numerical

Approximate Distinct Count 21420
Approximate Unique (%) 99.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 4.5805e+09
Minimum 1.0001e+06
Maximum 9.9e+09
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • id is skewed right (γ1 = 0.2432)

Quantile Statistics

Minimum 1.0001e+06
5th Percentile 5.1274e+08
Q1 2.123e+09
Median 3.9049e+09
Q3 7.3089e+09
95th Percentile 9.2973e+09
Maximum 9.9e+09
Range 9.899e+09
IQR 5.1859e+09
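The quantile table reports the 5th and 95th percentiles, the quartiles, Range (maximum minus minimum) and IQR (Q3 minus Q1). The report does not state which interpolation rule it uses. Values such as 1800.8 for sqft_lot suggest linear interpolation between order statistics, which is what Python's `statistics.quantiles(..., method="inclusive")` does. A minimal sketch under that assumption, using a hypothetical `quantile_summary` helper:

```python
import statistics

def quantile_summary(values):
    """Quartiles, 5th/95th percentiles, range and IQR via linear interpolation."""
    data = sorted(values)  # needs at least two values
    # method="inclusive" treats min and max as the 0th and 100th percentiles
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    pct = statistics.quantiles(data, n=100, method="inclusive")  # 99 cut points
    return {
        "min": data[0],
        "p5": pct[4],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": pct[94],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }

summary = quantile_summary(range(1, 101))
print(summary["q1"], summary["q3"], summary["iqr"])  # 25.75 75.25 49.5
```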

Descriptive Statistics

Mean 4.5805e+09
Standard Deviation 2.8767e+09
Variance 8.2756e+18
Sum 9.8925e+13
Skewness 0.2432
Kurtosis -1.2607
Coefficient of Variation 0.628
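The descriptive statistics follow standard formulas. The coefficient of variation is the standard deviation divided by the mean (2.8767e+09 / 4.5805e+09 ≈ 0.628 for id). Skewness is the Fisher-Pearson coefficient γ1. Kurtosis appears to be excess kurtosis, because id's value of -1.2607 is close to -1.2, the excess kurtosis of a uniform distribution. The sketch below assumes those definitions and a sample (n - 1) standard deviation. The report does not say which convention it uses, but with 21597 rows the difference is negligible.

```python
import statistics

def describe(values):
    """Mean, std, variance, CV, skewness (g1) and excess kurtosis (g2)."""
    data = list(values)  # assumes at least two distinct values
    n = len(data)
    mean = statistics.fmean(data)
    # Central moments use the population (1/n) convention
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "std": statistics.stdev(data),          # sample standard deviation (n - 1)
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "cv": statistics.stdev(data) / mean,    # coefficient of variation
        "skewness": m3 / m2 ** 1.5,             # Fisher-Pearson coefficient g1
        "kurtosis": m4 / m2 ** 2 - 3,           # excess kurtosis: 0 for a normal distribution
    }

stats = describe([1, 2, 3, 4, 5])
print(stats["skewness"], round(stats["kurtosis"], 4))  # 0.0 -1.3
```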

date

categorical

Approximate Distinct Count 372
Approximate Unique (%) 1.7%
Missing 0
Missing (%) 0.0%
Memory Size 1.5 MB

Length

Mean 8.9244
Standard Deviation 0.6098
Median 9
Minimum 8
Maximum 10

Sample

1st row 10/13/2014
2nd row 12/9/2014
3rd row 2/25/2015
4th row 12/9/2014
5th row 2/18/2015
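date is stored as text in month/day/year form without zero padding, which explains the string lengths of 8 to 10 and the high-cardinality warning. The order is inferred from the sample rows, since 13 in 10/13/2014 can only be a day. A minimal sketch that parses these strings into real dates with the standard library. `parse_sale_date` is an illustrative name.

```python
from datetime import date, datetime

def parse_sale_date(text: str) -> date:
    """Parse month/day/year strings such as '10/13/2014' or '2/25/2015'."""
    # strptime's %m and %d accept both one- and two-digit values
    return datetime.strptime(text, "%m/%d/%Y").date()

samples = ["10/13/2014", "12/9/2014", "2/25/2015"]
parsed = [parse_sale_date(s) for s in samples]
print(parsed[2].isoformat())  # 2015-02-25
```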

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 149547
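The row names in the Letter table match Unicode general-category names. The "/" separator falls under Other Punctuation, which the table does not list. With two slashes per date, 149547 decimal digits is roughly 21597 × 8.9244 characters minus 2 × 21597 slashes. A minimal sketch, assuming the report counts characters per category in this way:

```python
import unicodedata
from collections import Counter

# Unicode general-category codes and the names the report appears to use
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def char_category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter(unicodedata.category(ch) for v in values for ch in v)
    # '/' is category 'Po' (Other Punctuation) and is not reported here
    return {name: counts.get(code, 0) for code, name in CATEGORY_NAMES.items()}

print(char_category_counts(["10/13/2014", "12/9/2014"]))
```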

price

numerical

Approximate Distinct Count 3622
Approximate Unique (%) 16.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 540296.5735
Minimum 78000
Maximum 7.7e+06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 4.0231)

Quantile Statistics

Minimum 78000
5th Percentile 210000
Q1 322000
Median 450000
Q3 645000
95th Percentile 1.16e+06
Maximum 7.7e+06
Range 7.622e+06
IQR 323000

Descriptive Statistics

Mean 540296.5735
Standard Deviation 367368.1401
Variance 1.3496e+11
Sum 1.1669e+10
Skewness 4.0231
Kurtosis 34.5331
Coefficient of Variation 0.6799
  • price is not normally distributed (p-value 1.4090085183459613e-14)
  • price has 1158 outliers
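The report does not say how it defines an outlier. One common rule is Tukey's fences: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. For price this gives an upper fence of 1,129,500, which is below the 95th percentile of 1.16e+06. So more than 5% of the 21597 sales (about 1080) would be flagged, which is consistent with the 1158 reported but does not prove the rule. A sketch with illustrative helper names:

```python
import statistics

def tukey_fences(q1: float, q3: float, k: float = 1.5):
    """Lower and upper fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(values, k: float = 1.5) -> int:
    """Number of values outside Tukey's fences (needs at least two values)."""
    data = list(values)
    q1, _, q3 = statistics.quantiles(sorted(data), n=4, method="inclusive")
    lo, hi = tukey_fences(q1, q3, k)
    return sum(1 for x in data if x < lo or x > hi)

# Fences for price, using Q1 and Q3 from the quantile table above
print(tukey_fences(322000, 645000))       # (-162500.0, 1129500.0)
print(count_outliers([1, 2, 3, 4, 100]))  # 1
```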

bedrooms

numerical

Approximate Distinct Count 12
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 3.3732
Minimum 1
Maximum 33
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bedrooms is skewed right (γ1 = 2.0235)

Quantile Statistics

Minimum 1
5th Percentile 2
Q1 3
Median 3
Q3 4
95th Percentile 5
Maximum 33
Range 32
IQR 1

Descriptive Statistics

Mean 3.3732
Standard Deviation 0.9263
Variance 0.858
Sum 72851
Skewness 2.0235
Kurtosis 49.81
Coefficient of Variation 0.2746
  • bedrooms is not normally distributed (p-value 3.0190749210869117e-18)
  • bedrooms has 530 outliers

bathrooms

numerical

Approximate Distinct Count 29
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 2.1158
Minimum 0.5
Maximum 8
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bathrooms is skewed right (γ1 = 0.5197)

Quantile Statistics

Minimum 0.5
5th Percentile 1
Q1 1.75
Median 2.25
Q3 2.5
95th Percentile 3.5
Maximum 8
Range 7.5
IQR 0.75

Descriptive Statistics

Mean 2.1158
Standard Deviation 0.769
Variance 0.5913
Sum 45695.5
Skewness 0.5197
Kurtosis 1.2787
Coefficient of Variation 0.3634
  • bathrooms is not normally distributed (p-value 8.461816315923516e-13)
  • bathrooms has 561 outliers

sqft_living

numerical

Approximate Distinct Count 1034
Approximate Unique (%) 4.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 2080.3219
Minimum 370
Maximum 13540
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sqft_living is skewed right (γ1 = 1.4731)

Quantile Statistics

Minimum 370
5th Percentile 940
Q1 1430
Median 1910
Q3 2550
95th Percentile 3760
Maximum 13540
Range 13170
IQR 1120

Descriptive Statistics

Mean 2080.3219
Standard Deviation 918.1061
Variance 842918.8569
Sum 4.4929e+07
Skewness 1.4731
Kurtosis 5.2506
Coefficient of Variation 0.4413
  • sqft_living is not normally distributed (p-value 9.429759774292558e-07)
  • sqft_living has 571 outliers

sqft_lot

numerical

Approximate Distinct Count 9776
Approximate Unique (%) 45.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 15099.4088
Minimum 520
Maximum 1.6514e+06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sqft_lot is skewed right (γ1 = 13.0717)

Quantile Statistics

Minimum 520
5th Percentile 1800.8
Q1 5040
Median 7618
Q3 10685
95th Percentile 43307.2
Maximum 1.6514e+06
Range 1.6508e+06
IQR 5645

Descriptive Statistics

Mean 15099.4088
Standard Deviation 41412.6369
Variance 1.715e+09
Sum 3.261e+08
Skewness 13.0717
Kurtosis 285.4294
Coefficient of Variation 2.7427
  • sqft_lot is not normally distributed (p-value 4.7506097995093565e-25)
  • sqft_lot has 2419 outliers

floors

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.4 MB

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 1.0
2nd row 2.0
3rd row 1.0
4th row 1.0
5th row 1.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 43194
  • The top 2 categories (1.0, 2.0) account for over 50.0% of the values
  • floors has words of constant length

waterfront

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 2376
Missing (%) 11.0%
Memory Size 1.2 MB
  • The most frequent value (0.0) occurs over 130.65 times as often as the second most frequent value (1.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 38442
  • The top 2 categories (0.0, 1.0) account for over 50.0% of the values
  • waterfront has words of constant length
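The "most frequent value" insights compare how often values occur, not their magnitudes. For waterfront, 0.0 appears over 130 times as often as 1.0 among the non-missing rows. A minimal sketch of that ratio using `collections.Counter`, with hypothetical toy data:

```python
from collections import Counter

def top_two_ratio(values):
    """How many times more often the most frequent value occurs than the runner-up."""
    counts = Counter(v for v in values if v is not None)  # ignore missing entries
    # Requires at least two distinct non-missing values
    (first, n1), (second, n2) = counts.most_common(2)
    return first, second, n1 / n2

# Hypothetical toy column: 9 x '0.0', 3 x '1.0', 2 missing
vals = ["0.0"] * 9 + ["1.0"] * 3 + [None] * 2
print(top_two_ratio(vals))  # ('0.0', '1.0', 3.0)
```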

view

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 63
Missing (%) 0.3%
Memory Size 1.4 MB
  • The most frequent value (0.0) occurs over 20.29 times as often as the second most frequent value (2.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 43068
  • The top 2 categories (0.0, 2.0) account for over 50.0% of the values
  • view has words of constant length

condition

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.4 MB
  • The most frequent value (3) occurs over 2.47 times as often as the second most frequent value (4)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 3
2nd row 3
3rd row 3
4th row 5
5th row 3

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 21597
  • The top 2 categories (3, 4) account for over 50.0% of the values
  • condition has words of constant length

grade

numerical

Approximate Distinct Count 11
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 7.6579
Minimum 3
Maximum 13
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • grade is skewed right (γ1 = 0.7882)

Quantile Statistics

Minimum 3
5th Percentile 6
Q1 7
Median 7
Q3 8
95th Percentile 10
Maximum 13
Range 10
IQR 1

Descriptive Statistics

Mean 7.6579
Standard Deviation 1.1732
Variance 1.3764
Sum 165388
Skewness 0.7882
Kurtosis 1.1346
Coefficient of Variation 0.1532
  • grade is not normally distributed (p-value 8.258779444436106e-18)
  • grade has 1905 outliers

sqft_above

numerical

Approximate Distinct Count 942
Approximate Unique (%) 4.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 1788.5968
Minimum 370
Maximum 9410
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sqft_above is skewed right (γ1 = 1.4473)

Quantile Statistics

Minimum 370
5th Percentile 850
Q1 1190
Median 1560
Q3 2210
95th Percentile 3400
Maximum 9410
Range 9040
IQR 1020

Descriptive Statistics

Mean 1788.5968
Standard Deviation 827.7598
Variance 685186.2222
Sum 3.8628e+07
Skewness 1.4473
Kurtosis 3.4045
Coefficient of Variation 0.4628
  • sqft_above is not normally distributed (p-value 9.228192888552119e-07)
  • sqft_above has 610 outliers

sqft_basement

categorical

Approximate Distinct Count 304
Approximate Unique (%) 1.4%
Missing 0
Missing (%) 0.0%
Memory Size 1.4 MB
  • The most frequent value (0.0) occurs over 28.25 times as often as the second most frequent value, the non-numeric string "?"

Length

Mean 3.816
Standard Deviation 1.1854
Median 3
Minimum 1
Maximum 6

Sample

1st row 0.0
2nd row 400.0
3rd row 0.0
4th row 910.0
5th row 0.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 60818
  • The top 2 categories (0.0, ?) account for over 50.0% of the values
  • With punctuation stripped, the most frequent value (00) occurs over 59.11 times as often as the second most frequent value (6000)
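sqft_basement is profiled as categorical, with 0 missing values, even though its sample rows are numbers such as 400.0. The second most frequent value is the string "?", which a numeric parser rejects, and this is a plausible reason for the categorical type. The sketch below assumes "?" marks an unknown basement area (the report does not say so). It converts the column to floats and maps "?" to `None` so that it is counted as missing. `parse_basement` is an illustrative name.

```python
from typing import Optional

def parse_basement(raw: str) -> Optional[float]:
    """Convert a sqft_basement string to float, mapping the '?' marker to None."""
    # Assumption: '?' stands for an unknown value rather than a real category
    if raw.strip() == "?":
        return None
    return float(raw)

raw_values = ["0.0", "400.0", "?", "910.0"]
print([parse_basement(v) for v in raw_values])  # [0.0, 400.0, None, 910.0]
```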

yr_built

numerical

Approximate Distinct Count 116
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 1970.9997
Minimum 1900
Maximum 2015
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • yr_built is skewed left (γ1 = -0.4694)

Quantile Statistics

Minimum 1900
5th Percentile 1915
Q1 1951
Median 1975
Q3 1997
95th Percentile 2011
Maximum 2015
Range 115
IQR 46

Descriptive Statistics

Mean 1970.9997
Standard Deviation 29.3752
Variance 862.9044
Sum 4.2568e+07
Skewness -0.4694
Kurtosis -0.6578
Coefficient of Variation 0.0149

yr_renovated

numerical

Approximate Distinct Count 70
Approximate Unique (%) 0.4%
Missing 3842
Missing (%) 17.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 277.4 KB
Mean 83.6368
Minimum 0
Maximum 2015
Zeros 17011
Zeros (%) 78.8%
Negatives 0
Negatives (%) 0.0%
  • yr_renovated is skewed right (γ1 = 4.573)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 0
95th Percentile 0
Maximum 2015
Range 2015
IQR 0

Descriptive Statistics

Mean 83.6368
Standard Deviation 399.9464
Variance 159957.134
Sum 1.485e+06
Skewness 4.573
Kurtosis 18.9139
Coefficient of Variation 4.7819
  • yr_renovated is not normally distributed (p-value 4.59971423304225e-25)
  • yr_renovated has 744 outliers
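The Zeros (%) figure for yr_renovated (78.8%) is computed over all 21597 rows, including the 3842 missing ones. Among the rows that do have a value, zeros make up a much larger share. The 744 outliers are exactly the non-zero values, as expected when Q1 = Q3 = 0 makes the IQR zero. A zero most likely means "never renovated" rather than the year 0, but that is an assumption. The arithmetic, using only the report's numbers:

```python
rows = 21597
missing = 3842   # yr_renovated Missing
zeros = 17011    # yr_renovated Zeros

non_missing = rows - missing
nonzero = non_missing - zeros

print(round(100 * zeros / rows, 2))         # 78.77, share of all rows (as reported)
print(round(100 * zeros / non_missing, 2))  # 95.81, share of rows that have a value
print(nonzero)                              # 744, matches the reported outlier count
```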

zipcode

numerical

Approximate Distinct Count 70
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 98077.9518
Minimum 98001
Maximum 98199
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • zipcode is skewed right (γ1 = 0.4053)

Quantile Statistics

Minimum 98001
5th Percentile 98004
Q1 98033
Median 98065
Q3 98118
95th Percentile 98177
Maximum 98199
Range 198
IQR 85

Descriptive Statistics

Mean 98077.9518
Standard Deviation 53.5131
Variance 2863.6489
Sum 2.1182e+09
Skewness 0.4053
Kurtosis -0.8541
Coefficient of Variation 0.00054562

lat

numerical

Approximate Distinct Count 5033
Approximate Unique (%) 23.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 47.5601
Minimum 47.1559
Maximum 47.7776
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • lat is skewed left (γ1 = -0.4855)

Quantile Statistics

Minimum 47.1559
5th Percentile 47.3103
Q1 47.4711
Median 47.5718
Q3 47.678
95th Percentile 47.7497
Maximum 47.7776
Range 0.6217
IQR 0.2069

Descriptive Statistics

Mean 47.5601
Standard Deviation 0.1386
Variance 0.0192
Sum 1.0272e+06
Skewness -0.4855
Kurtosis -0.6759
Coefficient of Variation 0.002913
  • lat has 2 outliers

long

numerical

Approximate Distinct Count 751
Approximate Unique (%) 3.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean -122.214
Minimum -122.519
Maximum -121.315
Zeros 0
Zeros (%) 0.0%
Negatives 21597
Negatives (%) 100.0%
  • long is skewed right (γ1 = 0.8848)

Quantile Statistics

Minimum -122.519
5th Percentile -122.387
Q1 -122.328
Median -122.231
Q3 -122.125
95th Percentile -121.9798
Maximum -121.315
Range 1.204
IQR 0.203

Descriptive Statistics

Mean -122.214
Standard Deviation 0.1407
Variance 0.0198
Sum -2.6395e+06
Skewness 0.8848
Kurtosis 1.0516
Coefficient of Variation -0.001151
  • long is not normally distributed (p-value 0.002577721040230141)
  • long has 255 outliers
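The coefficient of variation is std / mean, so it takes the sign of the mean. That is why long has a negative value. For coordinates and zip codes the origin is arbitrary, so the ratio depends on that origin and carries little meaning. The CV is only meaningful for positive, ratio-scale quantities such as price or area. A short check using the reported numbers:

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """std / mean; the sign follows the mean."""
    return std / mean

# Values from the long and zipcode tables above
print(round(coefficient_of_variation(0.1407, -122.214), 6))     # -0.001151
print(round(coefficient_of_variation(53.5131, 98077.9518), 8))  # 0.00054562
```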

sqft_living15

numerical

Approximate Distinct Count 777
Approximate Unique (%) 3.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 1986.6203
Minimum 399
Maximum 6210
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sqft_living15 is skewed right (γ1 = 1.1068)

Quantile Statistics

Minimum 399
5th Percentile 1140
Q1 1490
Median 1840
Q3 2360
95th Percentile 3300
Maximum 6210
Range 5811
IQR 870

Descriptive Statistics

Mean 1986.6203
Standard Deviation 685.2305
Variance 469540.7996
Sum 4.2905e+07
Skewness 1.1068
Kurtosis 1.5911
Coefficient of Variation 0.3449
  • sqft_living15 is not normally distributed (p-value 0.002315067825740229)
  • sqft_living15 has 543 outliers

sqft_lot15

numerical

Approximate Distinct Count 8682
Approximate Unique (%) 40.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 337.5 KB
Mean 12758.2835
Minimum 651
Maximum 871200
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sqft_lot15 is skewed right (γ1 = 9.5237)

Quantile Statistics

Minimum 651
5th Percentile 2002.4
Q1 5100
Median 7620
Q3 10083
95th Percentile 37045.2
Maximum 871200
Range 870549
IQR 4983

Descriptive Statistics

Mean 12758.2835
Standard Deviation 27274.442
Variance 7.439e+08
Sum 2.7554e+08
Skewness 9.5237
Kurtosis 151.3603
Coefficient of Variation 2.1378
  • sqft_lot15 is not normally distributed (p-value 4.987260523157603e-25)
  • sqft_lot15 has 2188 outliers
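The report flags sqft_lot and sqft_lot15 as having similar distributions; their medians are 7618 and 7620. It does not say which test it uses. One standard way to quantify the similarity is the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs (0 means identical, 1 means no overlap). A stdlib sketch of that swapped-in technique, not necessarily the report's method:

```python
from bisect import bisect_right

def ks_statistic(a, b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The ECDF gap can only change at observed values, so checking those is enough
    for x in a + b:
        gap = abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
        d = max(d, gap)
    return d

print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(ks_statistic([1, 2], [3, 4]))        # 1.0
```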

Interactions

Correlations

Missing Values